benchmarks: Initial DirectTrajOpt benchmark suite (Ipopt vs MadNLP) by jack-champagne · Pull Request #75 · harmoniqs/DirectTrajOpt.jl

jack-champagne · 2026-04-24T20:54:00Z

Re-open of #67 (was merged into wrong base, reverted, re-applied).

Summary

Initial benchmarking infrastructure for DirectTrajOpt.jl comparing Ipopt and MadNLP solvers.

Depends on: HarmoniqsBenchmarks.jl

What's included

benchmark/Project.toml — Benchmark environment with HarmoniqsBenchmarks, BenchmarkTools, TestItems, MadNLP
benchmark/benchmarks.jl — Three @testitem suites:
- Evaluator micro-benchmarks (bilinear N=51)
- Ipopt vs MadNLP macro-benchmark (bilinear N=51)
- Memory scaling study — sweep N ∈ {25, 51, 101} × state_dim ∈ {4, 8, 16}
.github/workflows/benchmark.yml — CI workflow

Test plan

Run benchmark suite — all 3 benchmarks produce JLD2 artifacts
Verify Ipopt vs MadNLP timing numbers
CI green

🤖 Generated with Claude Code

codecov · 2026-04-24T21:22:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…pt-initial" This reverts commit 83da076.

- Add docs/src/benchmarks.md mirroring CuQuantum.jl pattern: problem description, result tables, environment info, reproduction instructions - Wire benchmarks page into docs/make.jl - Extract shared problem constructors to benchmark/problem_utils.jl (make_bilinear_problem, make_scaled_problem) to eliminate duplication - Remove hardcoded package_version, pass runner consistently - Update README with docs link and regression detection example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove fragile Base.loaded_modules_order lookup for MadNLPSolverExt; MadNLPOptions is exported directly from DirectTrajOpt - Change MadNLP print_level from 1 (TRACE) to 6 (ERROR) to match Ipopt's silent output and avoid polluting benchmark stdout - Derive state_dim/control_dim from problem_dims(prob) instead of hardcoding in micro-benchmark MicroBenchmarkResult - Set BENCHMARK_RUNNER=github-actions in CI workflow so results are distinguishable from local runs - Remove unused ForwardDiff and LinearAlgebra from benchmark/Project.toml Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Pin HarmoniqsBenchmarks source to fix/widen-dto-compat branch which has DirectTrajOpt compat "0.8, 0.9" (fixes Pkg.instantiate failure) - Run JuliaFormatter on benchmark files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…merge The compat-widening branch has merged to HBJ main, so point the benchmark Project.toml at main. HBJ is not yet registered in General, so we still need the explicit URL+rev source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The uncommented top-level `Base.loaded_modules_order` lookup and the `for seed = 0:99` wins-counting loop are scope creep from the original PR — the file is not picked up by `@run_package_tests` nor referenced anywhere, so the only visible effect of uncommenting was a future landmine: the top-level lookup will throw `BoundsError` if MadNLPSolverExt is not loaded, and the loop contains `exit(1)` which would kill the test runner outright. Restore the original commented-out state. If we want to revive this as a real test later, wrap it in `@testitem` with `@test err < 1e-3` instead of `exit(1)`.

`contains(ti.filename, "benchmark")` would silently exclude any future testitem whose filename happens to contain "benchmark" (e.g. `test/regressions/hessian_benchmark.jl`). Switch to a path-component match via `splitpath` so only files actually under a `benchmark/` directory are excluded.

Tracking `rev = "main"` makes benchmark CI results non-reproducible: a force-push or follow-up commit on HBJ silently changes the benchmark schema for every old commit's CI re-run. Pin to the v0.2.0 prep tip (5401542c — convergence schema + extractors), bump as needed when we adopt new HBJ helpers. Drop in favor of `[compat]` once HBJ registers in General.

`make_scaled_problem` accepted `seed::Int = 42` but the scaling sweep never passed one — so every cell used seed 42, which means the timing table was effectively measuring nine variations of the same random instance rather than a sweep across instances. Generate a deterministic per-cell seed (`1000 + 100*N + dim`). Both solvers still receive the same instance for that cell so the Ipopt-vs-MadNLP comparison stays apples-to-apples; only across cells does the underlying problem change. This is still single-shot per cell — a follow-up could sweep K seeds and report median if we want statistical robustness.

The bare `try … catch end` blocks for `pkg_version` and the inline `try … catch; ; "unknown" end` for the git SHA both hid real failures silently (and the latter had a syntactically dubious `;` continuation). Capture the exception with `@warn` so CI surfaces a "Failed to look up …" message if the lookup ever breaks, rather than letting downstream results carry "unknown" with no explanation. Also hoist the git rev-parse into a named local for readability.

First-call JIT compile (Ipopt/MadNLP extension load, KKT/AD codegen) dominates wall-time on small problems and biases the Ipopt-vs-MadNLP comparison by solver order — whichever solver runs first pays the MOI extension JIT that the second avoids. Add a tiny `max_iter = 2` warmup solve per solver before the timed section in both the macro-benchmark and the memory-scaling sweep. Each @testitem runs in its own Julia process, so the warmup needs to be inside the testitem (compilation cache is per-process). This makes the reported wall-times steady-state numbers rather than "cold start + a few iterations of solve work".

The numbers in the benchmark page were single-run local samples that got captured into the docs build verbatim. Without a workflow wiring real CI artifacts into the docs, presenting them as bolded headline claims ("MadNLP is 33% faster", "40–45% less memory") would let them become the canonical DirectTrajOpt figures people quote — which is misleading because they shift with hardware, BLAS, MUMPS, Julia version, and even solver order before the warmup fix. - Add an upfront note that the tables are example output showing the schema of each benchmark, not pinned reference measurements. - Remove the "33% faster / 43% fewer / 40–45% less / 30–35% faster" bolded comparison sentences. - Reframe the scaling table around what it's actually good for: each solver vs itself over time, not a single-shot solver comparison. Real numbers should come from the CI workflow's JLD2 artifacts. Wiring those into the docs build is a follow-up.

JuliaFormatter line-broke the `DirectTrajOpt.solve!(...; options=...)` calls in the warmup blocks. Apply the formatter pass so the Formatter CI check stays green.

The convergence suite under `benchmark/convergence/` uses `@testitem` blocks that depend on `HarmoniqsBenchmarks` — which isn't in the package's main test environment. `@run_package_tests` was picking those up and failing with `ArgumentError: Package HarmoniqsBenchmarks not found in current path` on Julia 1.10/1.11/1.12. Filter out anything whose path includes a `benchmark` component. Match the path component (via `splitpath`) rather than a substring so future test files like `foo_benchmark.jl` aren't accidentally excluded. Mirrors the same filter added in PR #75 (initial benchmark suite).

MadNLP's `print_level` is a LogLevels enum where 0 isn't a valid value (`ArgumentError: invalid value for Enum LogLevels: 0`). The timed benchmark calls in this file already use `print_level=6` (silent at ERROR level) — make the warmup solves match. Caught by the benchmark CI workflow.

Single-shot timing on random instances was noisy enough to be misleading. The previous CI run on this PR produced one anomalous cell: N=25, dim=8: Ipopt 0.011s / 22KB alloc ← almost certainly an iter-0 termination on a seed-degenerate initial point The seed for that cell was `1000 + 100*25 + 8 = 3508`. Other cells trended roughly as expected with problem size, but cell-to-cell single shots are noisy enough that the comparison numbers aren't trustworthy (e.g. MadNLP at 25×16 took 29s but at 51×16 took 39s; MadNLP at 25×4 was slower than at 51×4). Switch to median over `n_seeds = 3` independent random instances per (N, dim) cell: - Seed scheme: `1000 + 100*N + dim + 10_000*(k-1)` for k ∈ 1:3. Per-cell distinct, per-sample distinct. - Both solvers receive the *same* instance for each (cell, seed) so per-seed Ipopt-vs-MadNLP comparisons stay fair. - All 3×K BenchmarkResults per cell are saved to JLD2 — the printed table shows medians but the raw distribution is preserved for downstream analysis. Adds Statistics to benchmark/Project.toml for `median`. Runtime cost: ~3x the previous scaling-sweep section, which puts the whole benchmark workflow at ~15 min wall time (previously ~9 min, still well under the workflow's default 6h timeout). Docs page (`docs/src/benchmarks.md`) updated to describe the median-of-K scheme and note that per-seed results are persisted to the JLD2 artifact.

jack-champagne mentioned this pull request Apr 25, 2026

fix: widen DirectTrajOpt compat to 0.8, 0.9 harmoniqs/HarmoniqsBenchmarks.jl#2

Closed

jack-champagne and others added 5 commits May 15, 2026 16:18

Reapply "Merge pull request #67 from harmoniqs/benchmarks/directtrajo…

7f3ff90

…pt-initial" This reverts commit 83da076.

jack-champagne force-pushed the benchmarks/directtrajopt-initial-v2 branch from 8d3ff5f to ff9d6f6 Compare May 15, 2026 20:21

jack-champagne changed the base branch from feat/madnlp-gpu-passthroughs to main May 20, 2026 05:01

jack-champagne added 8 commits May 20, 2026 01:03

chore: apply JuliaFormatter to warmup blocks

d770528

JuliaFormatter line-broke the `DirectTrajOpt.solve!(...; options=...)` calls in the warmup blocks. Apply the formatter pass so the Formatter CI check stays green.

jack-champagne mentioned this pull request May 20, 2026

bench: Ipopt + MadNLP allocation profile script #71

Closed

3 tasks

jack-champagne mentioned this pull request May 20, 2026

benchmark: alloc profile testitem (Ipopt + MadNLP) #95

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

benchmarks: Initial DirectTrajOpt benchmark suite (Ipopt vs MadNLP)#75

benchmarks: Initial DirectTrajOpt benchmark suite (Ipopt vs MadNLP)#75
jack-champagne wants to merge 15 commits into
mainfrom
benchmarks/directtrajopt-initial-v2

jack-champagne commented Apr 24, 2026

Uh oh!

codecov Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jack-champagne commented Apr 24, 2026

Summary

What's included

Test plan

Uh oh!

codecov Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov Bot commented Apr 24, 2026 •

edited

Loading